ECC polar coding and list decoding methods and codecs

ABSTRACT

A method of decoding data encoded with a polar code and devices that encode data with a polar code. A received word of polar encoded data is decoded following several distinct decoding paths to generate a list of codeword candidates. The decoding paths are successively duplicated and selectively pruned to generate a list of potential decoding paths. A single decoding path among the list of potential decoding paths is selected as the output and a single candidate codeword is thereby identified. In another preferred embodiment, the polar encoded data includes redundancy values in its unfrozen bits. The redundancy values aid the selection of the single decoding path. A preferred device of the invention is a cellular network device (e.g., a handset) that conducts decoding in accordance with the methods of the invention.

PRIORITY CLAIM AND REFERENCE TO RELATED APPLICATION

The application claims priority under 35 U.S.C. §119 from prior provisional application Ser. No. 61/670,381, which was filed Jul. 11, 2012, which application is incorporated by reference herein.

STATEMENT OF GOVERNMENT INTEREST

This invention was made with government support under CCF-1116820 awarded by National Science Foundation. The government has certain rights in the invention.

FIELD

A field of the invention is information coding and decoding. The invention is particularly useful for communication over noisy mediums. Example applications of the invention include communications (such as wireless communications including, e.g., cellular communications, satellite communications, and deep space communications) and data storage (such as used in data storage devices, e.g., computer hard disk devices).

BACKGROUND

Error-correcting codes are used whenever communication over a noisy medium (channel) takes place. Cell phones, computer hard disks, deep-space communication and many other devices communicate over a noisy medium. Error-correcting codes have been widely used and improved since 1948 as the search for optimal error correcting codes continued for decades. The basic problem in decoding is attempting to recover an original transmitted codeword from a received word that is a distorted version of the original codeword. The distortion is introduced by the noisy medium.

Polar codes were introduced in 2009. See, E. Arikan, “Channel Polarization: A method for Constructing Capacity Achieving Codes for Symmetric Binary-Input Memoryless Channels,” IEEE Trans. Inform. Theory, vol. 55, pp. 3051-3073 (2009); E. Arikan and E. Telatar, “On the Rate of Channel Polarization,” in Proc. IEEE Int'l Symp. Inform. Theory, Seoul, South Korea, pp. 1493-1495 (2009).

Polar codes were the first and presently remain the only family of codes known to have an explicit construction (no ensemble to pick from) and efficient encoding and decoding algorithms, while also being capacity achieving over binary input symmetric memoryless channels. A drawback of existing polar codes to date is disappointing performance for short to moderate block lengths. Polar codes have not been widely implemented despite recognized inherent advantages over other coding schemes, such as turbo codes, because channel polarization remains slow in prior methods.

List decoding was introduced in the 1950s. See, P. Elias, “List decoding for noisy channels,” Technical Report 335, Research Laboratory of Electronics, MIT (1957). List decoding addresses a worst case scenario by outputting a small number of codewords that are a small distance from the received word. List decoding has not been widely used, however. Modern applications of list decoding have sought to reduce worst-case decoding penalties. Successive cancellation list decoding has been applied to Reed-Muller codes. See, I. Dumer and K. Shabunov, “Soft-decision Decoding of Reed-Muller codes: Recursive Lists,” IEEE Trans. Inform. Theory, vol. 52, pp. 1260-1266 (2006). Reed-Muller codes are structured differently than polar codes and are widely considered in the art to have a different decoding approach. Indeed, Arikan's original paper that presented polar codes emphasized differences between polar codes and Reed-Muller codes. Phrasing of the decoding algorithms in Reed-Muller and polar codes makes comparison difficult. The present inventors recognized that Arikan's successive cancellation decoding is similar in nature to the successive cancellation decoding of Reed-Muller codes as in Dumer-Shabunov. However, application of successive list decoding as set forth in Dumer-Shabunov would increase complexity of polar decoding to an extent that would make its application impractical. The successive cancellation list decoding in Dumer-Shabunov is also complex, and can lead to Ω(L·n²) complexity. As with prior list decoders, it will also fail to produce a single output, instead producing a small list of candidates without a single, explicit codeword.

The observation that one can reduce the space complexity of successive cancellation decoders for polar codes with hardware architectures to O(n) was noted, in the context of VLSI design, by the present inventors and colleagues in C. Leroux, I. Tal, A. Vardy, and W. J. Gross, “Hardware Architectures for Successive Cancellation Decoding of Polar Codes,” arXiv:1011.2919v1 (2010). This paper does not provide a different decoding approach for polar codes, but provides architectures that can reduce the space complexity for the decoding scheme that was provided by Arikan with the introduction of polar codes.

SUMMARY OF THE INVENTION

An embodiment of the invention is a method of decoding data encoded with a polar code. A received word of polar encoded data is decoded following several distinct decoding paths to generate codeword candidates. The decoding paths are selectively successively duplicated and pruned to generate a list of potential decoding paths. A single decoding path among the list of potential decoding paths is selected as the output and a single candidate codeword is identified as the output. In another preferred embodiment, the polar encoded data includes redundancy data in its unfrozen bits. The redundancy data aids the selection of the single decoding path. A preferred device of the invention is a cellular network device, e.g., a handset that conducts decoding in accordance with the methods of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows word error rate of a length n=2048 rate ½ polar code optimized for SNR=2 dB under various list sizes for a code construction consistent with I. Tal and A. Vardy, “How to construct polar codes,” submitted to IEEE Trans. Inform. Theory, available online as arXiv:1105.6164v2 (2011); with two dots representing upper and lower bounds on the SNR needed to reach a word error rate of 10⁻⁵;

FIG. 2 shows a comparison of polar coding in accordance with present embodiments and decoding schemes to an implementation of the WiMax standard from TurboBest, “IEEE 802.16e LDPC Encoder/Decoder Core.” [Online], with codes of rate ½; the length of the polar code is 2048, the length of the WiMax code is 2034, the list size used was L=32, and the CRC used was 16 bits long;

FIG. 3 shows a comparison of normalized rate for a wide class of codes with a target word error rate of 10⁻⁴;

FIG. 4 shows decoding paths in accordance with the invention of unfrozen bits for L=4: each level has at most 4 nodes with paths that continue downward, and discontinued paths are grayed out;

FIGS. 5A and 5B show word error rate of length n=2048 (FIG. 5A) and n=8192 (FIG. 5B) rate ½ polar code in accordance with the present invention optimized for SNR=2 dB under various list sizes; the code construction was carried out as in FIG. 1.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present inventors have recognized that drawbacks in polar codes at short to medium block lengths arise either from inherent weaknesses of the code itself at these lengths or from the fact that the successive cancellation (SC) decoder employed to decode them is significantly degraded with respect to maximum likelihood (ML) decoding performance. These two possibilities are complementary, and so both may occur.

Disclosed are methods and their computer implementation that greatly improve the error-correcting performance of polar codes. Polar codes are a family of error correcting codes that facilitate the transfer of information from a transmitter to a receiver over a noisy medium (e.g., as happens in cell phones, computer hard disks, deep-space communication, etc.). The invention employs a new decoding method for polar codes as well as a modification of the codes themselves. The method has been fully implemented and tested. The resulting performance is better than the current state-of-the-art in error-correction coding.

Preferred embodiments of the invention also provide a modified polar code and decoding method. In prior polar coding methods, on the transmitting end (encoding), a sequence of K information bits is mapped to a codeword of length n. In the preferred embodiment, k information bits and r CRC (cyclic redundancy check) bits together constitute the K=k+r bits mapped to a codeword of length n. These bits are denoted as u_1, u_2, ..., u_K. On the receiving end (decoding), instead of first decoding u_1 to either 0 or 1, then decoding u_2 to either 0 or 1, and so forth, what occurs is as follows. When decoding u_1, both the option of it being a 0 and the option of it being a 1 are considered. These two options are termed “paths”. For each such path, both options of u_2 lead to 4 paths, and so forth.

An aspect of the invention provides an improvement to the SC decoder, namely, a successive cancellation list (SCL) decoder. The list decoder has a corresponding list size L, and setting L=1 results in the classic SC decoder. While lists are used in the decoder when the algorithm executes, the decoder returns a single codeword. L is an integer parameter that can be freely chosen by a system designer. L=1 indicates the classic successive cancellation decoding (of Arikan). Higher values of L lead to better performance and higher complexity. In examples used to test the invention, the highest value of L is 32, but much higher values are possible, extending into tens of thousands. In embodiments of the invention, L is the number of different decoding paths after pruning. After duplication, that value is 2L. In preferred embodiments, the pruning reduces the number of paths from 2L to L.
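The duplicate-then-prune step described above can be illustrated with a minimal sketch. It is a toy model and not the decoder of the invention: the function name `split_and_prune` and the callable `prob_one` (which stands in for the bit-channel probabilities computed by the real decoder) are assumptions made for illustration only.

```python
def split_and_prune(paths, prob_one, list_size):
    """Fork every path into a 0-branch and a 1-branch, then keep the L most likely.

    paths     -- list of (bits, probability) pairs, bits being a tuple of 0/1
    prob_one  -- hypothetical callable: bits -> probability that the next bit is 1
    list_size -- the list size L
    """
    forks = []
    for bits, prob in paths:
        p1 = prob_one(bits)
        forks.append((bits + (0,), prob * (1.0 - p1)))  # fork that assumes a 0
        forks.append((bits + (1,), prob * p1))          # fork that assumes a 1
    # Up to 2L forks exist at this point; prune back to at most L survivors.
    forks.sort(key=lambda fork: fork[1], reverse=True)
    return forks[:list_size]


# Usage: three unfrozen bits, list size L = 2.
per_bit = [0.9, 0.2, 0.6]                 # toy probabilities, independent of the path
paths = [((), 1.0)]
for _ in per_bit:
    paths = split_and_prune(paths, lambda bits: per_bit[len(bits)], 2)
```

With L=1 the same loop keeps only the single most likely fork at each step, which corresponds to the classic SC behavior noted above.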

Embodiments of the invention also include encoders and decoders for ECC (error corrected coded) polar code communications. In principle, a decoder duplicates the data structures used by a parent path each time a decoding path splits into two forks, so that each fork receives a copy. The cost of making actual copies grows quickly. A preferred embodiment decoder avoids this and reduces the copying expense as follows: at each given stage, the same array may be flagged as belonging to more than one decoding path. However, when a given decoding path needs access to an array it is sharing with another path, a copy is made.

Embodiments of the invention also include polar concatenated codes executed by computer hardware and/or software that aid the decoding. Instead of setting all unfrozen bits to information bits to transmit, the following concatenation is applied. For some small constant r, embodiments of the invention set the first k−r unfrozen bits to information bits. The last r unfrozen bits will hold the r-bit CRC (cyclic redundancy check) value of the first k−r unfrozen bits. This incurs a reasonable penalty: the code rate becomes (k−r)/n. During decoding, the concatenation provides a shortcut to refine selection. A path for which the CRC is invalid cannot correspond to the transmitted codeword. Thus, the selection can be refined as follows. If at least one path has a correct CRC, then remove from the list all paths having incorrect CRC and then choose the most likely path. Otherwise, select the most likely path in the hope of reducing the number of bits in error, but with the knowledge that at least one bit is in error.
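The selection rule above can be sketched as follows. This is an illustrative sketch only: the generator polynomial x³+x+1 (so r=3), the helper `crc_bits`, and the function `select_path` are assumptions chosen to keep the example small, and are not the 16-bit CRC used in the reported experiments.

```python
def crc_bits(bits, poly=(1, 0, 1, 1)):
    """Remainder of bits(x) * x^r divided by poly(x) over GF(2), r = len(poly) - 1.

    The default polynomial x^3 + x + 1 is a hypothetical choice for illustration.
    """
    r = len(poly) - 1
    reg = list(bits) + [0] * r
    for i in range(len(bits)):
        if reg[i]:                       # leading term present: subtract (XOR) poly
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[-r:]


def select_path(candidates, r=3):
    """Pick one path from a list of (unfrozen_bits, likelihood) candidates.

    The last r unfrozen bits are taken to hold the CRC of the preceding bits.
    """
    valid = [c for c in candidates
             if crc_bits(c[0][:-r]) == list(c[0][-r:])]
    pool = valid if valid else candidates   # fall back when no CRC checks out
    return max(pool, key=lambda c: c[1])


# Usage: the less likely candidate wins because only its CRC is valid.
info = [1, 1, 0, 1]
good = info + crc_bits(info)
bad = [good[0] ^ 1] + good[1:]              # single bit error, CRC now invalid
chosen = select_path([(bad, 0.9), (good, 0.1)])
```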

Artisans will appreciate that preferred embodiments of the invention efficiently decode polar codes by generating a list of candidate codewords with successive cancellation decoding. In preferred embodiments, a codeword is selected from the list using an outer CRC coded polar code provided by the invention.

In a preferred list decoder of the invention, up to L decoding paths are considered concurrently at each decoding stage. Simulation results show that the resulting performance is very close to that of a maximum-likelihood decoder, even for moderate values of L. The present list decoder effectively bridges the gap between successive-cancellation and maximum-likelihood decoding of polar codes. The specific list-decoding algorithm that achieves this performance doubles the number of decoding paths at each decoding step, and then uses the pruning procedure to discard all but the L best paths. The natural pruning criterion can be easily evaluated. Nevertheless, a straightforward implementation still requires O(L·n²) time, which is in stark contrast with the O(n log n) complexity of the original successive-cancellation decoder. The structure of polar codes is used to overcome this problem and provide an efficient, numerically stable implementation taking only O(L·n log n) time and O(L·n) space. The polar coding strategies of the invention achieve better performance with lower complexity. In the preferred embodiment list decoder, up to L decoding paths are considered concurrently at each decoding stage. Then, a single codeword is selected from the list as output. If the most likely codeword is selected, simulation results show that the resulting performance is very close to that of a maximum-likelihood decoder, even for moderate values of L. Alternatively, if an intelligent selection procedure selects the codeword from the list, the results are comparable to the current state of the art in LDPC codes.

The preferred list decoder doubles the number of decoding paths at each decoding step, and then uses a pruning procedure to discard all but the L best paths. Nevertheless, a straightforward implementation still requires O(L·n²) time, which is in stark contrast with the O(n log n) complexity of the original successive-cancellation decoder. The structure of polar codes is exploited by the invention with an efficient, numerically stable implementation taking only O(L·n log n) time and O(L·n) space.
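As a rough illustration of the gap between the two bounds, consider the parameters that appear elsewhere in this description, n=2048 and L=32. The figures below ignore the constant factors hidden by the O-notation and are therefore only indicative of orders of magnitude, not of measured running times.

$\begin{aligned}
L \cdot n^{2} &= 32 \cdot 2048^{2} = 134{,}217{,}728, \\
L \cdot n \log_{2} n &= 32 \cdot 2048 \cdot 11 = 720{,}896, \\
\frac{L \cdot n^{2}}{L \cdot n \log_{2} n} &= \frac{2048}{11} \approx 186.
\end{aligned}$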

Those knowledgeable in the art will appreciate that embodiments of the present invention lend themselves well to practice in the form of computer program products. Accordingly, it will be appreciated that embodiments of the present invention may comprise computer program products comprising computer executable instructions stored on a non-transitory computer readable medium that, when executed, cause a computer to undertake methods according to the present invention, or a computer configured to carry out such methods. The executable instructions may comprise computer program language instructions that have been compiled into a machine-readable format. The non-transitory computer-readable medium may comprise, by way of example, a magnetic, optical, signal-based, and/or circuitry medium useful for storing data. The instructions may be downloaded entirely or in part from a networked computer. Also, it will be appreciated that the term “computer” as used herein is intended to broadly refer to any machine capable of reading and executing recorded instructions. It will also be understood that results of methods of the present invention may be displayed on one or more monitors or displays (e.g., as text, graphics, charts, code, etc.), printed on suitable media, stored in appropriate memory or storage, etc.

Preferred embodiments of the invention will now be discussed with respect to the experiments and results. Artisans will appreciate additional features of the invention from the discussion of the experimental results and example embodiments.

A preferred embodiment Successive Cancellation (SC) decoder is a modification to the polar codec of Arikan. The basic decoder must first be defined to explain the modifications.

Let the polar code under consideration have length $n = 2^{m}$ and dimension k. Thus, the number of frozen bits is n−k. The reformulation denotes by $u = (u_i)_{i=0}^{n-1} = u_0^{n-1}$ the information bits vector (including the frozen bits), and by $c = c_0^{n-1}$ the corresponding codeword, which is sent over a binary-input channel $W : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X} = \{0, 1\}$. At the other end of the channel, the received word is $y = y_0^{n-1}$. A decoding algorithm is then applied to y, resulting in a decoded codeword $\hat{c}$ having corresponding information bits $\hat{u}$.

A. An Outline of Successive Cancellation

A high-level description of the SC decoding algorithm is provided in Algorithm 1 (see Algorithm section). In Algorithm 1, at each phase φ of the algorithm, the pair of probabilities $W_m^{(\varphi)}(y_0^{n-1}, \hat{u}_0^{\varphi-1} \mid 0)$ and $W_m^{(\varphi)}(y_0^{n-1}, \hat{u}_0^{\varphi-1} \mid 1)$ is calculated. Then the value of $\hat{u}_\varphi$ is determined according to the pair of probabilities.

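Probabilities are calculated as follows. For any layer $0 \le \lambda \le m$, denote

$\Lambda = 2^{\lambda}. \qquad (1)$

For any phase

$0 \le \varphi < \Lambda, \qquad (2)$

bit channel $W_\lambda^{(\varphi)}$ is a binary-input channel with output alphabet $\mathcal{Y}^{\Lambda} \times \mathcal{X}^{\varphi}$, the conditional probability of which is generically denoted

$W_\lambda^{(\varphi)}(y_0^{\Lambda-1}, u_0^{\varphi-1} \mid u_\varphi). \qquad (3)$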

In the present context, $y_0^{\Lambda-1}$ is always a contiguous subvector of the received vector y. Next, for $1 \le \lambda \le m$, recall the recursive definition of a bit channel (provided in Equations (22) and (23) of E. Arikan, “Channel Polarization: A method for Constructing Capacity Achieving Codes for Symmetric Binary-Input Memoryless Channels,” IEEE Trans. Inform. Theory, vol. 55, pp. 3051-3073 (2009)): let $0 \le 2\psi < \Lambda$, then

$\begin{aligned}
\overbrace{W_{\lambda}^{(2\psi)}\left(y_0^{\Lambda-1}, u_0^{2\psi-1} \mid u_{2\psi}\right)}^{\text{branch } \beta}
= \sum_{u_{2\psi+1}} \frac{1}{2}\,
&\underbrace{W_{\lambda-1}^{(\psi)}\left(y_0^{\Lambda/2-1}, u_{0,\mathrm{even}}^{2\psi-1} \oplus u_{0,\mathrm{odd}}^{2\psi-1} \mid u_{2\psi} \oplus u_{2\psi+1}\right)}_{\text{branch } 2\beta} \\
\cdot\, &\underbrace{W_{\lambda-1}^{(\psi)}\left(y_{\Lambda/2}^{\Lambda-1}, u_{0,\mathrm{odd}}^{2\psi-1} \mid u_{2\psi+1}\right)}_{\text{branch } 2\beta+1} \qquad (4)
\end{aligned}$

and

$\begin{aligned}
\overbrace{W_{\lambda}^{(2\psi+1)}\left(y_0^{\Lambda-1}, u_0^{2\psi} \mid u_{2\psi+1}\right)}^{\text{branch } \beta}
= \frac{1}{2}\,
&\underbrace{W_{\lambda-1}^{(\psi)}\left(y_0^{\Lambda/2-1}, u_{0,\mathrm{even}}^{2\psi-1} \oplus u_{0,\mathrm{odd}}^{2\psi-1} \mid u_{2\psi} \oplus u_{2\psi+1}\right)}_{\text{branch } 2\beta} \\
\cdot\, &\underbrace{W_{\lambda-1}^{(\psi)}\left(y_{\Lambda/2}^{\Lambda-1}, u_{0,\mathrm{odd}}^{2\psi-1} \mid u_{2\psi+1}\right)}_{\text{branch } 2\beta+1} \qquad (5)
\end{aligned}$

with “stopping condition” $W_0^{(0)}(y \mid u) = W(y \mid u)$.
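Recursions (4) and (5), together with the stopping condition, can be evaluated directly. The sketch below does so naively, with no reuse of intermediate results, so its cost grows far faster than that of the array-based implementations discussed in this description; it is meant only to make the index bookkeeping concrete. The function names `bit_channel` and `bsc`, and the use of a binary symmetric channel, are assumptions for illustration.

```python
def bsc(p):
    """Hypothetical underlying channel W: binary symmetric, crossover probability p."""
    return lambda y, u: 1.0 - p if y == u else p


def bit_channel(y, u_prev, u_phi, W):
    """Evaluate W_lambda^(phi)(y, u_prev | u_phi), where len(y) = 2**lambda
    and phi = len(u_prev), by direct use of recursions (4) and (5)."""
    if len(y) == 1:                        # stopping condition: W_0^(0) = W
        return W(y[0], u_phi)
    half = len(y) // 2
    phi = len(u_prev)
    psi = phi // 2
    even = u_prev[0:2 * psi:2]             # even-indexed entries of u_0^(2psi-1)
    odd = u_prev[1:2 * psi:2]              # odd-indexed entries of u_0^(2psi-1)
    xor = [a ^ b for a, b in zip(even, odd)]
    if phi % 2 == 0:                       # equation (4): sum over unknown u_(2psi+1)
        return sum(0.5
                   * bit_channel(y[:half], xor, u_phi ^ v, W)
                   * bit_channel(y[half:], odd, v, W)
                   for v in (0, 1))
    u_even = u_prev[2 * psi]               # u_(2psi) is already known when phi is odd
    return (0.5
            * bit_channel(y[:half], xor, u_even ^ u_phi, W)
            * bit_channel(y[half:], odd, u_phi, W))


# Usage: n = 2, received word (0, 0), phase 0, over a BSC with p = 0.1.
W = bsc(0.1)
pair = (bit_channel([0, 0], [], 0, W), bit_channel([0, 0], [], 1, W))
```

In this small case the pair evaluates to approximately (0.41, 0.09), so an SC decoder with an unfrozen first bit would favor the value 0.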

For Algorithm 1 to become concrete, it is necessary to specify how the probability pair associated with $W_m^{(\varphi)}$ is calculated, and how the values of $\hat{u}$, namely $\hat{u}_0^{\varphi-1}$, are propagated into those calculations. For λ>0 and $0 \le \varphi < \Lambda$, recall the recursive definition of $W_\lambda^{(\varphi)}(y_0^{\Lambda-1}, u_0^{\varphi-1} \mid u_\varphi)$ given in either (4) or (5), depending on the parity of φ. For either φ=2ψ or φ=2ψ+1, the channel $W_{\lambda-1}^{(\psi)}$ is evaluated with output $(y_0^{\Lambda/2-1}, u_{0,\mathrm{even}}^{2\psi-1} \oplus u_{0,\mathrm{odd}}^{2\psi-1})$ as well as with output $(y_{\Lambda/2}^{\Lambda-1}, u_{0,\mathrm{odd}}^{2\psi-1})$. Preferred embodiments utilize these recursions. A simple way of identifying which output is meant aids the analysis. This can be accomplished by specifying, apart from the layer λ and the phase φ which define the channel, the branch number

$0 \le \beta < 2^{m-\lambda}. \qquad (6)$

During the run of the SC algorithm, the channel $W_m^{(\varphi)}$ is only evaluated with a single output $(y_0^{n-1}, \hat{u}_0^{\varphi-1})$, and a branch number β=0 is assigned to each such output. Next, proceed recursively as follows. For λ>0, consider a channel $W_\lambda^{(\varphi)}$ with output $(y_0^{\Lambda-1}, u_0^{\varphi-1})$ and corresponding branch number β. Denote $\psi = \lfloor \varphi/2 \rfloor$. In accordance with the branch labels in (4) and (5), the output $(y_0^{\Lambda/2-1}, u_{0,\mathrm{even}}^{2\psi-1} \oplus u_{0,\mathrm{odd}}^{2\psi-1})$ associated with $W_{\lambda-1}^{(\psi)}$ will have a branch number of 2β, while the output $(y_{\Lambda/2}^{\Lambda-1}, u_{0,\mathrm{odd}}^{2\psi-1})$ will have a branch number of 2β+1. In what follows, reference is made to the output corresponding to branch β of a channel.

Embodiments of the invention define and use a first data structure. For each layer $0 \le \lambda \le m$, a probabilities array is denoted by $P_\lambda$, indexed by an integer $0 \le i < 2^{m}$ and a bit b ∈ {0, 1}. For a given layer λ, an index i will correspond to a phase $0 \le \varphi < \Lambda$ and a branch $0 \le \beta < 2^{m-\lambda}$ using the following quotient/remainder representation.

$i = \langle \varphi, \beta \rangle_{\lambda} = \varphi + 2^{\lambda} \cdot \beta \qquad (7)$
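The quotient/remainder representation (7) is a one-to-one correspondence between pairs (φ, β) and array indices i. A minimal sketch, with the function names `pair_index` and `split_index` introduced here purely for illustration:

```python
def pair_index(phi, beta, lam):
    """Index i = <phi, beta>_lambda = phi + 2**lambda * beta, as in (7)."""
    return phi + (1 << lam) * beta


def split_index(i, lam):
    """Inverse of pair_index: the remainder is the phase, the quotient is the branch."""
    return i % (1 << lam), i >> lam


# Usage: for m = 3 and layer lambda = 2 there are 4 phases and 2 branches.
m, lam = 3, 2
indices = sorted(pair_index(phi, beta, lam)
                 for beta in range(1 << (m - lam))
                 for phi in range(1 << lam))
```

Every index from 0 to 2^m − 1 is produced exactly once, which is why a single array of 2^m entries per layer suffices at this stage.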

To avoid repetition, the following shorthand is adopted

$P_\lambda[\langle \varphi, \beta \rangle] = P_\lambda[\langle \varphi, \beta \rangle_{\lambda}] \qquad (8)$

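The probabilities array data structure $P_\lambda$ is used as follows. Let layer $0 \le \lambda \le m$, phase $0 \le \varphi < \Lambda$, and branch $0 \le \beta < 2^{m-\lambda}$ be given. Denote the output corresponding to branch β of $W_\lambda^{(\varphi)}$ as $(y_0^{\Lambda-1}, u_0^{\varphi-1})$. Then ultimately, for both values of b, it is obtained that

$P_\lambda[\langle \varphi, \beta \rangle][b] = W_\lambda^{(\varphi)}(y_0^{\Lambda-1}, u_0^{\varphi-1} \mid b) \qquad (9)$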

The input corresponding to a branch can be defined via a similar terminology. Start at layer m and continue recursively. Consider the channel $W_m^{(\varphi)}$, and let $\hat{u}_\varphi$ be the corresponding input which Algorithm 1 assumes. This input is assigned a branch number β=0. Proceed recursively as follows. For layer λ>0, consider the channels $W_\lambda^{(2\psi)}$ and $W_\lambda^{(2\psi+1)}$ having the same branch β with corresponding inputs $u_{2\psi}$ and $u_{2\psi+1}$, respectively. In view of (5), consider $W_{\lambda-1}^{(\psi)}$ and define the inputs corresponding to branches 2β and 2β+1 as $u_{2\psi} \oplus u_{2\psi+1}$ and $u_{2\psi+1}$, respectively. Under this recursive definition, for all $0 \le \lambda \le m$, $0 \le \varphi < \Lambda$, and $0 \le \beta < 2^{m-\lambda}$, the input corresponding to branch β of $W_\lambda^{(\varphi)}$ is well defined.

The following lemma points at the natural meaning that a branch number has at layer λ=0. This can be proved using a straightforward induction.

Lemma 1: Let y and $\hat{c}$ be as in Algorithm 1, the received vector and the decoded codeword. Consider layer λ=0 and thus set φ=0. Next, fix a branch number $0 \le \beta < 2^{m} = n$. Then, the output and input corresponding to branch β of $W_0^{(0)}$ are $y_\beta$ and $\hat{c}_\beta$, respectively.

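A second data structure is now introduced. For each layer $0 \le \lambda \le m$, a bit array is denoted by $B_\lambda$, and indexed by an integer $0 \le i < 2^{m}$, as in (7). The data structure can be used as follows. Let layer $0 \le \lambda \le m$, phase $0 \le \varphi < \Lambda$, and branch $0 \le \beta < 2^{m-\lambda}$ be given. Denote the input corresponding to branch β of $W_\lambda^{(\varphi)}$ as $\hat{u}(\lambda, \varphi, \beta)$. Then ultimately,

$B_\lambda[\langle \varphi, \beta \rangle] = \hat{u}(\lambda, \varphi, \beta), \qquad (10)$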

which adopts the same shorthand as (8). The total memory consumed by this algorithm is O(n log n). A preferred first implementation of the SC decoder is given as Algorithms 2-4 (see Algorithm Section). The main loop is given in Algorithm 2, and follows the high-level description given in Algorithm 1. Note that the elements of the probabilities arrays $P_\lambda$ and bit arrays $B_\lambda$ start out uninitialized, and become initialized as the algorithm runs its course. The code to initialize the array values is given in Algorithms 3 and 4.

Lemma 2: Algorithms 2-4 are a valid implementation of the SC decoder.

Proof: In addition to proving the claim explicitly stated in the lemma, an implicit claim must also be proven. Namely, the actions taken by the algorithm should be shown to be well defined. This is shown by demonstrating that when an array element is read from, it was already written to (it is initialized).

Both the implicit and the explicit claims are easily derived from the following observation. For a given $0 \le \varphi < n$, consider iteration φ of the main loop in Algorithm 2. Fix a layer $0 \le \lambda \le m$ and a branch $0 \le \beta < 2^{m-\lambda}$. If a run of the algorithm is suspended just after the iteration ends, then (9) holds with $\varphi'$ instead of φ, for all

$0 \le \varphi' \le \left\lfloor \frac{\varphi}{2^{m-\lambda}} \right\rfloor$

Similarly, (10) holds with $\varphi'$ instead of φ for all

$0 \le \varphi' < \left\lfloor \frac{\varphi + 1}{2^{m-\lambda}} \right\rfloor$

The above observation is proved by induction on φ.

The running time of the known SC decoder is O(n log n) and the implementation provided above is no exception. The space complexity of the present algorithm is O(n log n) as well. However, using the above observation, the space complexity can be reduced to O(n).

As a first step towards this end, consider the probability pair array $P_m$. By examining the main loop in Algorithm 2, it is seen that when it is currently at phase φ, it will never again make use of $P_m[\langle \varphi', 0 \rangle]$ for all φ′<φ. On the other hand, $P_m[\langle \varphi'', 0 \rangle]$ is uninitialized for all φ″>φ. Thus, instead of reading and writing to $P_m[\langle \varphi, 0 \rangle]$, it is possible to essentially disregard the phase information, and use only the first element $P_m[0]$ of the array, discarding all the rest. By the recursive nature of polar codes, this observation (disregarding the phase information) can be exploited for a general layer λ as well. Specifically, for all $0 \le \lambda \le m$, it is now possible to define the number of elements in $P_\lambda$ to be $2^{m-\lambda}$.

Accordingly,

$P_\lambda[\langle \varphi, \beta \rangle]$ is replaced by $P_\lambda[\beta]. \qquad (11)$

Note that the total space needed to hold the P arrays has gone down from O(n log n) to O(n). That is also desirable for the B arrays. However, the above implementation does not permit the phase to be disregarded, as can be seen, for example, in line 3 of Algorithm 4. The solution is a simple renaming. As a first step, define for each $0 \le \lambda \le m$ an array $C_\lambda$ consisting of bit pairs and having length n/2. Next, let a generic reference of the form $B_\lambda[\langle \varphi, \beta \rangle]$ be replaced by $C_\lambda[\psi + \beta \cdot 2^{\lambda-1}][\varphi \bmod 2]$, where $\psi = \lfloor \varphi/2 \rfloor$. This renames the elements of $B_\lambda$ as elements of $C_\lambda$. It is now possible to disregard the value of ψ and take note only of the parity of φ. One more substitution is made: replace every instance of $C_\lambda[\psi + \beta \cdot 2^{\lambda-1}][\varphi \bmod 2]$ by $C_\lambda[\beta][\varphi \bmod 2]$, and resize each array $C_\lambda$ to have $2^{m-\lambda}$ bit pairs. To sum up,

$B_\lambda[\langle \varphi, \beta \rangle]$ is replaced by $C_\lambda[\beta][\varphi \bmod 2]. \qquad (12)$

A further reduction in space is possible: for λ=0, φ=0, and thus the parity of φ is always even. However, this reduction does not affect the asymptotic space complexity, which is now indeed down to O(n). The revised algorithm is given as Algorithms 5-7.
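The O(n) figure follows from summing the resized arrays over all layers. As a short worked check, assuming one array per layer $0 \le \lambda \le m$ with the sizes given in (11) and (12), the number of probability pairs (and likewise of bit pairs) is a geometric sum, to be compared with the count before the reduction:

$\begin{aligned}
\sum_{\lambda=0}^{m} 2^{m-\lambda} &= 2^{m+1} - 1 = 2n - 1 = O(n), \\
\sum_{\lambda=0}^{m} 2^{m} &= (m+1) \cdot 2^{m} = n(\log_{2} n + 1) = O(n \log n).
\end{aligned}$

For n=2048 (m=11) this amounts to 4,095 pairs after the reduction instead of 24,576 before it.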

The above statements are also of use in analyzing the time complexity of the preferred embodiment list decoder.

A preferred embodiment is referred to as a successive cancellation list (SCL) decoder, and example decoding is shown in FIG. 4. The list decoder has a parameter L, called the list size. Generally speaking, larger values of L mean lower error rates but longer running times. In the main loop of an SC decoder, each phase provides a decision on the value of $\hat{u}_\varphi$. In the present SCL decoder, instead of deciding to set the value of an unfrozen $\hat{u}_\varphi$ to either a 0 or a 1, the decoding path splits into two paths (see FIG. 4) to be examined. Paths must be pruned to limit the maximum number of paths allowed to the specified list size, L. A pruning criterion is provided to keep the most likely paths at each stage. A simple implementation of the splitting and pruning can proceed as follows. Each time a decoding path is split into two forks, the data structures used by the “parent” path are duplicated, with one copy given to the first fork and the other to the second. Since the number of splits is Ω(L·n), and since the size of the data structures used by each path is Ω(n), the copying operation alone would consume time Ω(L·n²). This running time is only practical for short codes. However, all implementations of successive cancellation list decoding known to the present inventors have complexity at least Ω(L·n²). Preferred embodiments of SCL decoding reduce time complexity to O(L·n log n) instead of Ω(L·n²).

Consider the P arrays and recall that the size of $P_\lambda$ is proportional to $2^{m-\lambda}$. Thus, the cost of copying $P_\lambda$ decreases exponentially with λ. On the other hand, looking at the main loop of Algorithm 5 and unwinding the recursion, $P_\lambda$ is accessed only every $2^{m-\lambda}$ increments of φ. The bigger $P_\lambda$ is, the less frequently it is accessed. The same observation applies to the C arrays. This observation of the present inventors leads to the use of a “lazy-copy” operation in preferred embodiments. Namely, at each given stage, the same array may be flagged as belonging to more than one decoding path. However, when a given decoding path needs access to an array it is sharing with another path, a copy is made.

Low-level functions and data structures can provide the “lazy-copy” methodology. The formulation is kept simple for purposes of explanation, but artisans will recognize some clear optimizations. The following data structures are defined and initialized in Algorithm 8.

Each path will have an index l, where $0 \le l < L$. At first, only one path will be active. As the algorithm runs its course, paths will change states between “active” and “inactive”. The inactivePathIndices stack (see Section 10.1 of T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, “Introduction to Algorithms,” 2nd ed. Cambridge, Mass.: The MIT Press (2001)) will hold the indices of the inactive paths. This assumes the “array” implementation of a stack, in which both “push” and “pop” operations take O(1) time and a stack of capacity L takes O(L) space. The activePath array is a Boolean array such that activePath[l] is true if path l is active. Note that, essentially, both inactivePathIndices and activePath store the same information. The utility of this redundancy will be made clear shortly.

For every layer λ, there will be a “bank” of L probability-pair arrays for use by the active paths. At any given moment, some of these arrays might be used by several paths, while others might not be used by any path. Each such array is pointed to by an element of arrayPointer_P. Likewise, there will be a bank of bit-pair arrays, pointed to by elements of arrayPointer_C.

The pathIndexToArrayIndex array is used as follows. For a given layer λ and path index l, the probability-pair array and bit-pair array corresponding to layer λ of path l are pointed to by arrayPointer_P[λ][pathIndexToArrayIndex[λ][l]] and arrayPointer_C[λ][pathIndexToArrayIndex[λ][l]], respectively.

At any given moment, some probability-pair and bit-pair arrays from the bank might be used by multiple paths, while others may not be used by any. The value of arrayReferenceCount[λ][s] denotes the number of paths currently using the array pointed to by arrayPointer_P[λ][s]. This is also the number of paths making use of arrayPointer_C[λ][s]. The index s is contained in the stack inactiveArrayIndices[λ] if arrayReferenceCount[λ][s] is zero.

With the data structures initialized, the low-level functions by which paths are made active and inactive can be stated. Start by reference to Algorithm 9, by which the initial path of the algorithm is assigned and allocated. This serves to choose a path index l that is not currently in use (none of them are), and mark it as used. Then, for each layer λ, mark (through pathIndexToArrayIndex) an index s such that both arrayPointer_P[λ][s] and arrayPointer_C[λ][s] are allocated to the current path.

Algorithm 10 is used to clone a path, the final step before splitting that path in two. The logic is very similar to that of Algorithm 9, but now the two paths are made to share bit-arrays and probability arrays.

Algorithm 11 is used to terminate a path, which is achieved by marking it as inactive. After this is done, the arrays marked as associated with the path are considered. Since the path is inactive, it is treated as not having any associated arrays, and thus all the arrays that were previously associated with the path can have their reference count decreased by one.

The goal of all previously discussed low-level functions was essentially to enable the abstraction implemented by the functions getArrayPointer_P and getArrayPointer_C. The function getArrayPointer_P is called each time a higher-level function needs to access (either for reading or writing) the probability-pair array associated with a certain path l and layer λ. The implementation of getArrayPointer_P is provided in Algorithm 12. There are two cases to consider: either the array is associated with more than one path or it is not. If it is not, then nothing needs to be done, and a pointer can be returned to the array. On the other hand, if the array is shared, a private copy is created for path l, and a pointer is returned to that copy. This ensures that two paths will never write to the same array. The function getArrayPointer_C is used in the same manner for bit-pair arrays, and has essentially the same implementation.
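The lazy-copy bookkeeping described in the preceding paragraphs can be sketched as a small class. It is written from the prose description only and is not a reproduction of Algorithms 8-12: the class name `LazyArrays`, the snake_case method names, and the use of a single bank of arrays (rather than separate probability-pair and bit-pair banks) are simplifying assumptions made for illustration.

```python
class LazyArrays:
    """Copy-on-write bank of per-layer arrays shared between decoding paths."""

    def __init__(self, m, list_size):
        layers = range(m + 1)
        self.inactive_paths = list(range(list_size))      # stack of free path indices
        self.active = [False] * list_size
        # arrays[lam][s]: array number s of layer lam, holding 2**(m - lam) entries
        self.arrays = [[[0.0] * (1 << (m - lam)) for _ in range(list_size)]
                       for lam in layers]
        self.ref_count = [[0] * list_size for _ in layers]
        self.free_arrays = [list(range(list_size)) for _ in layers]
        self.path_to_array = [[None] * list_size for _ in layers]

    def assign_initial_path(self):
        l = self.inactive_paths.pop()
        self.active[l] = True
        for lam in range(len(self.arrays)):
            s = self.free_arrays[lam].pop()
            self.path_to_array[lam][l] = s
            self.ref_count[lam][s] = 1
        return l

    def clone_path(self, l):
        """Create a new path that shares every array of path l (no copying yet)."""
        l2 = self.inactive_paths.pop()
        self.active[l2] = True
        for lam in range(len(self.arrays)):
            s = self.path_to_array[lam][l]
            self.path_to_array[lam][l2] = s
            self.ref_count[lam][s] += 1
        return l2

    def kill_path(self, l):
        self.active[l] = False
        self.inactive_paths.append(l)
        for lam in range(len(self.arrays)):
            s = self.path_to_array[lam][l]
            self.ref_count[lam][s] -= 1
            if self.ref_count[lam][s] == 0:
                self.free_arrays[lam].append(s)

    def get_array(self, lam, l):
        """Return a private array for path l at layer lam, copying only if shared."""
        s = self.path_to_array[lam][l]
        if self.ref_count[lam][s] > 1:
            s2 = self.free_arrays[lam].pop()
            self.arrays[lam][s2] = list(self.arrays[lam][s])
            self.ref_count[lam][s] -= 1
            self.ref_count[lam][s2] = 1
            self.path_to_array[lam][l] = s2
            s = s2
        return self.arrays[lam][s]


# Usage: a clone shares arrays until it first asks for access to one.
bank = LazyArrays(m=2, list_size=2)
a = bank.assign_initial_path()
bank.get_array(0, a)[0] = 0.5
b = bank.clone_path(a)
shared_before = bank.path_to_array[0][a] == bank.path_to_array[0][b]
bank.get_array(0, b)[0] = 0.9          # triggers the private copy for path b
```

Only the layer that path b actually touched is copied; the arrays of the other layers remain shared, which is the source of the savings discussed above.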

The above implementation deliberately sacrifices speed for simplicity. Namely, each such function is called either before reading or writing to an array. A variation to optimize speed conducts the copy operation only before writing.

This completes the definition of almost all low-level functions. Constraints that should be followed and what is expected if these constraints are met are provided next.

Definition 1 (Valid calling sequence): Consider a sequence (f_(t))_(t=0)^(T) of T+1 calls to the low-level functions implemented in Algorithms 8-12. The sequence is considered valid if the following traits hold.

Initialized: The one and only index t for which f_(t) is equal to initializeDataStructures is t=0. The one and only index t for which f_(t) is equal to assignInitialPath is t=1.

Balanced: For 1≦t≦T, denote the number of times the function clonePath was called up to and including stage t as #_(clonePath)^((t)) = |{1≦i≦t : f_(i) is clonePath}|.

Define #_(killPath)^((t)) similarly. Then for every 1≦t≦T, the algorithm requires that 1≦(1+#_(clonePath)^((t))−#_(killPath)^((t)))≦L.   (13)

Active: A path l is active at the end of stage 1≦t≦T if the following conditions hold. First, there exists an index 1≦i≦t for which f_(i) is either clonePath with corresponding output l or assignInitialPath with output l. Second, there is no intermediate index i<j≦t for which f_(j) is killPath with input l. For each 1≦t<T we require that if f_(t+1) has input l, then l is active at the end of stage t.
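As an illustration of the "balanced" trait, the following Python sketch checks condition (13) for a sequence of calls f_1, ..., f_T given as function-name strings. The function name is_balanced and the string encoding of calls are assumptions made for this example only.

```python
def is_balanced(calls, L):
    """Return True if 1 <= 1 + #clonePath - #killPath <= L after every call.

    `calls` lists the names of f_1, ..., f_T (f_1 being assignInitialPath).
    """
    active = 1  # the single path created by assignInitialPath at t = 1
    for name in calls:
        if name == "clonePath":
            active += 1
        elif name == "killPath":
            active -= 1
        if not (1 <= active <= L):
            return False
    return True
```

For instance, with L=2 the sequence assignInitialPath, clonePath, killPath is balanced, whereas two consecutive clonePath calls would exceed the bound.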

Lemma 3: Let (f_(t))_(t=0)^(T) be a valid sequence of calls to the low-level functions implemented in Algorithms 8-12. Then, the run is well defined: i) A “pop” operation is never carried out on an empty stack, ii) a “push” operation never results in a stack with more than L elements, and iii) a “read” operation from any array defined in lines 2-7 of Algorithm 8 is always preceded by a “write” operation to the same location in the array.

Proof: The proof reduces to proving the following four statements concurrently for the end of each step 1≦t≦T, by induction on t.

I A path index l is active by Definition 1 iff activePath[l] is true iff inactivePathIndices does not contain the index l.

II The bracketed expression in (13) is the number of active paths at the end of stage t.

III The value of arrayReferenceCount[λ][s] is positive iff the stack inactiveArrayIndices[λ] does not contain the index s, and is zero otherwise.

IV The value of arrayReferenceCount[λ][s] is equal to the number of active paths l for which pathIndexToArrayIndex[λ][l]=s.

Before completing formalization of the utility of the low-level functions, the concept of a descendant path needs to be specified. Let (f_(t))_(t=0)^(T) be a valid sequence of calls. Next, let l be an active path index at the end of stage 1≦t≦T. Henceforth, abbreviate the phrase “path index l at the end of stage t” by “[l,t]”. [l′,t+1] is a child of [l,t] if i) l′ is active at the end of stage t+1, and ii) either l′=l or f_(t+1) was the clonePath operation with input l and output l′. Likewise, [l′,t′] is a descendant of [l,t] if 1≦t≦t′ and there is a (possibly empty) hereditary chain.

The definition of a valid function calling sequence can now be broadened by allowing reads and writes to arrays.

Fresh pointer: consider the case where t>1 and f_(t) is either the getArrayPointer_P or getArrayPointer_C function with input (λ,l) and output p. Then, for valid indices i, allow read and write operations to p[i] after stage t but only before any stage t′>t for which f_(t′) is either clonePath or killPath.

Informally, the following lemma states that each path effectively sees a private set of arrays.

Lemma 4: Let (f_(t))_(t=0)^(T) be a valid sequence of calls to the low-level functions implemented in Algorithms 8-12. Assume the read/write operations between stages satisfy the “fresh pointer” condition.

Let the function f_(t) be getArrayPointer_P with input (λ,l) and output p. Similarly, for stage t′>t, let f_(t′) be getArrayPointer_P with input (λ,l′) and output p′. Assume that [l′,t′] is a descendant of [l,t].

Consider a “fresh pointer” write operation to p[i]. Similarly, consider a “fresh pointer” read operation from p′[i] carried out after the “write” operation. Then, assuming no intermediate “write” operations of the above nature, the value written is the value read.

A similar claim holds for getArrayPointer_C.

Proof: With the observations made in the proof of Lemma 3 at hand, a simple induction on t is all that is needed.

The function pathIndexInactive given in Algorithm 13 is simply a shorthand, meant to help readability.

B. Mid-Level Functions

Algorithms 14 and 15 are modified implementations of Algorithms 6 and 7, respectively, for the list decoding setting.

These implementations of the preferred embodiment loop over all the path indices l. Thus, the implementations make use of the functions getArrayPointer_P and getArrayPointer_C in order to assure that the consistency of calculations is preserved, despite multiple paths sharing information. In addition, Algorithm 14 contains code to normalize probabilities. The normalization is retained for a technical reason (to avoid floating-point underflow), and will be expanded on shortly.

Note that the “fresh pointer” condition imposed indeed holds. To see this, consider first Algorithm 14. The key point to note is that neither the killPath nor the clonePath function is called from inside the algorithm. The same observation holds for Algorithm 15. Thus the “fresh pointer” condition is met, and Lemma 4 holds.

Consider next the normalization step carried out in lines 21-27 of Algorithm 14. Recall that a floating-point variable cannot be used to hold arbitrarily small positive reals, and in a typical implementation, the result of a calculation that is “too small” will be rounded to 0. This scenario is called an “underflow”.

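Previous implementations of SC decoders were prone to “underflow”. To see this, consider line 1 in the outline implementation given in Algorithm 2. Denote by Y and U the random vectors corresponding to y and u, respectively. For b ∈ {0,1},

$W_{m}^{(\varphi)}\left( y_{0}^{n - 1},{\hat{u}}_{0}^{\varphi - 1} \mid b \right) = 2 \cdot {\mathbb{P}}\left( Y_{0}^{n - 1} = y_{0}^{n - 1},\, U_{0}^{\varphi - 1} = {\hat{u}}_{0}^{\varphi - 1},\, U_{\varphi} = b \right) \leq 2 \cdot {\mathbb{P}}\left( U_{0}^{\varphi - 1} = {\hat{u}}_{0}^{\varphi - 1},\, U_{\varphi} = b \right) = 2^{- \varphi}.$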

Recall that φ iterates from 0 to n−1. Thus, for codes having length greater than some small constant, the comparison in line 1 of Algorithm 2 ultimately becomes meaningless, since both probabilities are rounded to 0.

Preferred embodiments provide a fix to this problem. After the probabilities are calculated in lines 5-20 of Algorithm 14, normalize the highest probability to be 1 in lines 21-27. The correction does not guarantee in all circumstances that underflows will not occur. However, the probability of a meaningless comparison due to underflow will be extremely low.
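The effect of the normalization can be illustrated with a toy Python experiment. This is not the decoder itself: the per-step factors 0.3 and 0.2 are arbitrary values chosen for illustration, and the function name run is hypothetical. Each step halves and scales a pair of values, mimicking how the probabilities shrink as the phase advances.

```python
def run(steps, normalize):
    """Repeatedly shrink a pair of 'probabilities'; optionally rescale so the max is 1."""
    p = [1.0, 1.0]
    for _ in range(steps):
        p = [0.5 * 0.3 * p[0], 0.5 * 0.2 * p[1]]
        if normalize:
            sigma = max(p)               # plays the role of sigma in Algorithm 14
            p = [x / sigma for x in p]   # largest entry becomes exactly 1
    return p


raw = run(1200, normalize=False)
scaled = run(1200, normalize=True)
```

Under IEEE double precision, the unnormalized pair collapses to two zeros, so comparing its entries says nothing, while the normalized pair still shows which entry is larger.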

Apart from minimizing the risk of underflows, normalization does not alter the algorithm. The following lemma formalizes this claim.

Lemma 5: Assume “perfect” floating-point numbers. That is, floating-point variables are infinitely accurate and do not suffer from underflow/overflow. Next, consider a variant of Algorithm 14, termed Algorithm 14′, in which just before line 21 is first executed, the variable σ is set to 1. That is, effectively, there is no normalization of probabilities in Algorithm 14′.

Consider two runs, one of Algorithm 14 and one of Algorithm 14′. In both runs, the input parameters to both algorithms are the same. Moreover, assume that in both runs, the state of the auxiliary data structures is the same, apart from the following.

Recall that the present algorithm is recursive, and let λ₀ be the first value of the variable λ for which line 5 is executed. That is, λ₀ is the layer in which (both) algorithms do not perform preliminary recursive calculations. Assume at this base stage λ=λ₀, the following holds: the values read from P_(λ−1) in lines 15 and 20 in the run of Algorithm 14 are a multiple by α_(λ−1) of the corresponding values read in the run of Algorithm 14′. Then, for every λ≧λ₀, there exists a constant α_(λ) such that values written to P_(λ) in line 27 in the run of Algorithm 14 are a multiple by α_(λ) of the corresponding values written by Algorithm 14′.

Proof: For the base case λ=λ₀, inspection shows that the constant α_(λ) is simply (α_(λ−1))², divided by the value of σ after the main loop has finished executing in Algorithm 14. The claim for a general λ follows by induction.
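The base case can be written out as a short worked step. The primed symbol P′ is notation introduced here only to denote the values in the run of Algorithm 14′; it does not appear in the algorithms. Each value written to P_(λ) before normalization is one half of a product of two values read from P_(λ−1) (or a sum of two such products), so the common factor α_(λ−1) enters squared, and line 27 then divides by σ:

```latex
P_{\lambda}[\beta][u]
  = \frac{1}{\sigma}\cdot\frac{1}{2}
    \bigl(\alpha_{\lambda-1} P'_{\lambda-1}[2\beta][\cdot]\bigr)
    \bigl(\alpha_{\lambda-1} P'_{\lambda-1}[2\beta+1][\cdot]\bigr)
  = \frac{\alpha_{\lambda-1}^{2}}{\sigma}\, P'_{\lambda}[\beta][u],
\qquad\text{so}\qquad
\alpha_{\lambda} = \frac{\alpha_{\lambda-1}^{2}}{\sigma}.
```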

C. High-Level Functions

Consider the topmost function, the main loop given in Algorithm 16. Lines 1 and 2 ensure that the condition “initialized” in Definition 1 is satisfied. Also, for the inductive basis, the condition “balanced” holds for t=1 at the end of line 2. Next, notice that lines 3-5 are in accordance with the “fresh pointer” condition.

The main loop, lines 6-13, is the analog of the main loop in Algorithm 5. After the main loop has finished, the algorithm selects (in lines 14-16) the most likely codeword from the list and returns it.

Algorithms 17 and 18 are now introduced. Algorithm 17 is the analog of line 6 in Algorithm 5, applied to active paths.

Algorithm 18 is the analog of lines 8-11 in Algorithm 5. However, now, instead of choosing the most likely fork out of 2 possible forks, it is typical to need to choose the L most likely forks out of 2L possible forks. The most interesting line is 14, in which the best ρ forks are marked. Surprisingly, this can be done in O(L) time (see Section 9.3 of T. H. Cormen, et al., “Introduction to Algorithms,” 2nd ed. Cambridge, Mass.: The MIT Press (2001)). The O(L) time result is rather theoretical. Since L is typically a small number, the fastest way to achieve the selection goal would be through simple sorting. After the forks are marked, first kill the paths for which both forks are discontinued, and then continue paths for which one or both of the forks are marked. In case of the latter, the path is first split. The procedure first kills paths and only then splits paths in order for the “balanced” constraint (13) to hold. This provides a limit of L active paths at a time.
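The marking step and the kill-then-split ordering can be sketched in Python. The function names select_forks and plan_actions are hypothetical, inactive paths are represented by None rather than by the −1 entries of Algorithm 18, and heapq.nlargest from the standard library is used in place of a linear-time selection, in line with the remark that simple sorting suffices for small L.

```python
import heapq


def select_forks(prob_forks, L):
    """Mark the rho = min(2 * active, L) most likely forks.

    prob_forks[l] is a pair (p0, p1) for an active path l, or None if inactive.
    Returns cont with cont[l][b] True iff fork b of path l is kept.
    """
    candidates = [(p, l, b)
                  for l, pair in enumerate(prob_forks) if pair is not None
                  for b, p in enumerate(pair)]
    rho = min(len(candidates), L)
    cont = [[False, False] for _ in prob_forks]
    for _, l, b in heapq.nlargest(rho, candidates):
        cont[l][b] = True
    return cont


def plan_actions(cont, prob_forks):
    """Paths to kill first (no fork kept), then paths to split (both forks kept)."""
    kills = [l for l, c in enumerate(cont)
             if prob_forks[l] is not None and not any(c)]
    splits = [l for l, c in enumerate(cont) if all(c)]
    return kills, splits
```

Performing all kills before any split frees a path index for every split, which is why the number of active paths never exceeds L in this sketch.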

A primary function of Algorithm 18 is to prune the list and leave only the L “best” paths. This pruning is performed using the accumulated likelihood of each path. The accumulated likelihood is stored in the “probForks” array of Algorithm 18. The selection of the L “best” paths is conducted on line 14 of Algorithm 18. Selection of the L “best” paths is indeed achieved, in the following sense. At stage φ, the goal is to rank each path according to the probability W_(m)^((φ))(y₀^(n−1), û₀^(φ−1)|û_(φ)).

By (9) and (11), this would indeed be the case if the floating-point variables were “perfect”, and the normalization step in lines 21-27 of Algorithm 14 were not carried out. By Lemma 5, this is still the case if normalization is carried out.

In Algorithm 19, the most probable path is selected from the final list. As before, by (9)-(12) and Lemma 5, the value of P_(m)[0][C_(m)[0][1]] is simply

${{W_{m}^{({n - 1})}\left( {y_{0}^{n - 1},{{\hat{u}}_{0}^{n - 2}❘{\hat{u}}_{n - 1}}} \right)} = {\frac{1}{2^{n - 1}} \cdot {P\left( {y_{0}^{n - 1}❘{\hat{u}}_{0}^{n - 1}} \right)}}},$

up to a normalization constant.

A proof of space and time complexity follows.

Theorem 6: The space complexity of the SCL decoder is O(L·n).

Proof: All the data structures of the list decoder are allocated in Algorithm 8, and it can be checked that the total space used by them is O(L·n). Apart from these, the space complexity needed in order to perform the selection operation in line 14 of Algorithm 18 is O(L). Lastly, the various local variables needed by the algorithm take O(1) space, and the stack needed in order to implement the recursion takes O(log n) space.

Theorem 7: The running time of the SCL decoder is O(L·n log n).

Proof: Recall that m=log n. The following bottom-to-top table summarizes the running time of each function. The notation O_(Σ) will be explained after.

function                          running time
initializeDataStructures( )       O(L · m)
assignInitialPath( )              O(m)
clonePath(l)                      O(m)
killPath(l)                       O(m)
getArrayPointer_P(λ, l)           O(2^(m−λ))
getArrayPointer_C(λ, l)           O(2^(m−λ))
pathIndexInactive(l)              O(1)
recursivelyCalcP(m, •)            O_(Σ)(L · m · n)
recursivelyUpdateC(m, •)          O_(Σ)(L · m · n)
continuePaths_FrozenBit(φ)        O(L)
continuePaths_UnfrozenBit(φ)      O(L · m)
findMostProbablePath              O(L)
SCL decoder                       O(L · m · n)

The first 7 functions in the table, the low-level functions, are easily checked to have the stated running time. Note that the running time of getArrayPointer_P and getArrayPointer_C is due to the copy operation in line 6 of Algorithm 12 applied to an array of size O(2^(m−λ)). Thus, as we previously mentioned, reducing the size of the arrays has helped to reduce the running time of the list decoding algorithm.

Next, consider the two mid-level functions, namely, recursivelyCalcP and recursivelyUpdateC. The notation

recursivelyCalcP(m,·) ∈ O_(Σ)(L·m·n)

means that the total running time of the n function calls

recursivelyCalcP(m,φ), 0≦φ<2^(m)

is O(L·m·n). To see this, denote by f(λ) the total running time of the above with m replaced by λ. By splitting the running time of Algorithm 14 into a non-recursive part and a recursive part, for λ>0,

f(λ)=2^(λ)·O(L·2^(m−λ))+f(λ−1).

Thus, it follows that

f(m) ∈ O(L·m·2^(m))=O(L·m·n).
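The step from the recursion to the bound can be unrolled explicitly. The only assumption added here is that f(0) is O(1), since a call with λ=0 returns immediately at the stopping condition:

```latex
f(m) = \sum_{\lambda=1}^{m} 2^{\lambda}\cdot O\!\left(L\cdot 2^{m-\lambda}\right) + f(0)
     = \sum_{\lambda=1}^{m} O\!\left(L\cdot 2^{m}\right) + O(1)
     = O\!\left(L\cdot m\cdot 2^{m}\right)
     = O(L\cdot m\cdot n).
```

Each of the m layers thus contributes the same O(L·n) amount of work, which is where the factor m=log n comes from.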

In essentially the same way, it can be proven that the total running time of recursivelyUpdateC(m, φ) over all 2^(m−1) valid (odd) values of φ is O(L·m·n). Note that the two mid-level functions are invoked in lines 7 and 13 of Algorithm 16, on all valid inputs.

The running time of the high-level functions is easily checked to agree with the table.

Modified Polar Codes

The plots in FIGS. 5A and 5B were obtained by simulation. The performance of the decoder for various list sizes is given by the solid lines in the figure. As expected, as the list size L increases, the performance of the decoder improves.

A diminishing-returns phenomenon is noticeable in terms of increasing list size. The reason for this turns out to be simple.

The dashed line, termed the “ML bound”, was obtained as follows. During simulations for L=32, each time a decoding failure occurred, a check was conducted to see whether the decoded codeword was more likely than the transmitted codeword, that is, whether W(y|ĉ)>W(y|c). If so, then the optimal ML decoder would surely misdecode y as well. The dashed line records the frequency of the above event, and is thus a lower bound on the error probability of the ML decoder. Thus, for an SNR value greater than about 1.5 dB, FIG. 1 suggests an essentially optimal decoder is provided when L=32.
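The bookkeeping behind such a bound can be sketched in Python. The function name ml_lower_bound and the trial record layout are assumptions for illustration, and log-likelihoods are compared instead of W(y|ĉ) and W(y|c) directly, which gives the same ordering because the logarithm is monotone.

```python
def ml_lower_bound(trials):
    """Fraction of trials in which even an ML decoder would have failed.

    Each trial is (decoded_ok, loglik_decoded, loglik_transmitted).
    A trial counts when decoding failed and the decoded codeword is
    strictly more likely than the transmitted one.
    """
    total = 0
    ml_failures = 0
    for decoded_ok, ll_decoded, ll_transmitted in trials:
        total += 1
        if not decoded_ok and ll_decoded > ll_transmitted:
            ml_failures += 1
    return ml_failures / total
```

Failures in which the transmitted codeword was the more likely one are not counted, since an ML decoder might have decoded those correctly.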

Better performance seems unlikely, at least for the region in which the decoder is essentially optimal. However, with a modified polar code of a preferred embodiment, dramatically improved performance can be achieved.

During simulations, when a decoding error occurred, the path corresponding to the transmitted codeword was often a member of the final list. However, since there was a more likely path in the list, the codeword corresponding to that path was returned, which resulted in a decoding error. Thus, if intelligent selection at the final stage can specify which path to pick from the list, then the performance of the decoder can be improved.

Such intelligent selection can be implemented with preferred embodiments that provide a modified polar code. Recall that there are k unfrozen bits that are free to be set. Instead of setting all of them to information bits to be transmitted, a concatenation scheme is employed. For some small constant r, set the first k−r unfrozen bits to information bits. The last r bits will hold the r-bit CRC (see Section 8.8 of W. W. Peterson and E. J. Weldon, Error-Correcting Codes, 2nd ed. Cambridge, Mass.: The MIT Press, 1972) value of the first k−r unfrozen bits. A binary linear code having a corresponding k×r parity-check matrix constructed as follows will perform well. Let the first k−r columns be chosen at random and the last r columns be equal to the identity matrix. This concatenated encoding is a variation of the polar coding scheme that provides an important functionality for intelligent selection, while being minor from the perspective of the polar code structure. The concatenation incurs a penalty in rate, since the rate of the code is now (k−r)/n instead of the previous k/n.
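The random parity-check variant mentioned above can be sketched in Python. The sketch assumes the matrix is read as H = [A | I_r] with r rows and k columns, so that the r redundancy bits equal A times the information bits modulo 2. The function names and the fixed seed are illustrative assumptions, not part of the described embodiment.

```python
import random


def make_redundancy_matrix(k, r, seed=0):
    """Random r x (k - r) binary matrix A; H = [A | I_r] is the parity-check matrix."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(k - r)] for _ in range(r)]


def append_redundancy(info_bits, A):
    """Return the k unfrozen bits: k - r information bits followed by r parity bits."""
    parity = [sum(a * x for a, x in zip(row, info_bits)) % 2 for row in A]
    return list(info_bits) + parity


def redundancy_ok(unfrozen_bits, A):
    """Check H * u = 0 (mod 2) by recomputing the parity bits (requires r > 0)."""
    r = len(A)
    info, parity = unfrozen_bits[:-r], unfrozen_bits[-r:]
    return append_redundancy(info, A)[-r:] == list(parity)
```

With k=8 and r=3, for example, five information bits are carried in eight unfrozen bits, which reflects the rate change from k/n to (k−r)/n.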

What is gained is an approximation to a perfect selector at the final stage of decoding. Instead of calling the function findMostProbablePath in Algorithm 19, do the following. A path for which the CRC is invalid cannot correspond to the transmitted codeword. Thus, refine the selection as follows. If at least one path has a correct CRC, then remove from the list all paths having an incorrect CRC, and then choose the most likely path. Otherwise, select the most likely path in the hope of reducing the number of bits in error, but with the knowledge that there is at least one bit in error.
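The refined selection rule can be sketched in Python. The bitwise CRC routine, the default generator polynomial (x^8+x^2+x+1), and the representation of a path as a (likelihood, unfrozen_bits) pair are assumptions made for this illustration; the source does not specify a particular CRC polynomial.

```python
def crc_bits(bits, poly=(1, 0, 0, 0, 0, 0, 1, 1, 1)):
    """Remainder of bits(x) * x^r modulo poly over GF(2).

    poly lists r + 1 coefficients, highest degree first.
    """
    r = len(poly) - 1
    reg = list(bits) + [0] * r
    for i in range(len(bits)):
        if reg[i]:
            for j, c in enumerate(poly):
                reg[i + j] ^= c
    return reg[-r:]


def select_path(paths, poly):
    """Pick the most likely path among those with a valid CRC, if any exist.

    paths is a list of (likelihood, unfrozen_bits); the last r bits are the CRC.
    Falls back to the most likely path overall when no CRC checks out.
    """
    r = len(poly) - 1
    valid = [p for p in paths if crc_bits(p[1][:-r], poly) == list(p[1][-r:])]
    pool = valid if valid else paths
    return max(pool, key=lambda p: p[0])
```

In this sketch a less likely path with a valid CRC is preferred over a more likely path whose CRC fails, which is exactly the case where plain likelihood ranking returns the wrong codeword.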

FIGS. 1 and 2 contain a comparison of decoding performance between the original polar codes and the preferred concatenated polar codes of the invention. A further improvement in bit-error-rate (but not in block-error-rate) can be obtained when the decoding is performed systematically as in E. Arikan, “Systematic polar coding,” IEEE Commun. Lett., vol. 15, pp. 860-862, (2011).

Advantageously, when the preferred algorithm finishes it outputs a single codeword. In addition, its performance approaches an ML decoder even with modest L values.

The solid line in FIG. 1 corresponds to choosing the most likely codeword from the list as the decoder output. As can be seen, this choice of the most likely codeword results in a large range in which the present algorithm has performance very close to that of the ML decoder, even for moderate values of L. Thus, the sub-optimality of the SC decoder indeed does play a role in the disappointing performance of polar codes.

The invention also shows that polar codes themselves are weak. Instead of picking the most likely codeword from the list, an intelligent selector can select the codeword in the list that was the transmitted codeword (if the transmitted codeword was indeed present in the list). Implementing such an intelligent selector turns out to be a minor structural change of the polar code with a minor penalty in preferred embodiments, and entails a modification of the polar code. With this modification, the performance of polar codes is comparable to state of the art LDPC codes, as can be seen in FIG. 2.

FIG. 3 shows that there are LDPC codes of length 2048 and rate ½ with better performance than the present polar codes. However, to the best of our knowledge, for length 1024 and rate ½, the present implementation is slightly better than previously known codes when considering a target error rate probability of 10⁻⁴.

While specific embodiments of the present invention have been shown and described, it should be understood that other modifications, substitutions and alternatives are apparent to one of ordinary skill in the art. Such modifications, substitutions and alternatives can be made without departing from the spirit and scope of the invention, which should be determined from the appended claims.

Various features of the invention are set forth in the appended claims.

Algorithms

Algorithm 1: A high-level description of the SC decoder
Input: the received vector y
Output: a decoded codeword ĉ
 1  for φ = 0, 1, . . . , n − 1 do
 2      calculate W_(m)^((φ))(y₀^(n−1), û₀^(φ−1)|0) and W_(m)^((φ))(y₀^(n−1), û₀^(φ−1)|1)
 3      if û_(φ) is frozen then
 4          set û_(φ) to the frozen value of u_(φ)
 5      else
 6          if W_(m)^((φ))(y₀^(n−1), û₀^(φ−1)|0) > W_(m)^((φ))(y₀^(n−1), û₀^(φ−1)|1) then
 7              set û_(φ) ← 0
 8          else
 9              set û_(φ) ← 1
10  return the codeword ĉ corresponding to û

Algorithm 2: First implementation of SC decoder
Input: the received vector y
Output: a decoded codeword ĉ
 1  for β = 0, 1, . . . , n − 1 do // Initialization
 2      P₀[⟨0, β⟩][0] ← W(y_(β)|0), P₀[⟨0, β⟩][1] ← W(y_(β)|1)
 3  for φ = 0, 1, . . . , n − 1 do // Main loop
 4      recursivelyCalcP(m, φ)
 5      if u_(φ) is frozen then
 6          set B_(m)[⟨φ, 0⟩] to the frozen value of u_(φ)
 7      else
 8          if P_(m)[⟨φ, 0⟩][0] > P_(m)[⟨φ, 0⟩][1] then
 9              set B_(m)[⟨φ, 0⟩] ← 0
10          else
11              set B_(m)[⟨φ, 0⟩] ← 1
12      if φ mod 2 = 1 then
13          recursivelyUpdateB(m, φ)
14  return the decoded codeword: ĉ = (B₀[⟨0, β⟩])_(β=0)^(n−1)

Algorithm 3: recursivelyCalcP(λ, φ) implementation I
Input: layer λ and phase φ
 1  if λ = 0 then return // Stopping condition
 2  set ψ ← ⌊φ/2⌋
    // Recurse first, if needed
 3  if φ mod 2 = 0 then recursivelyCalcP(λ − 1, ψ)
 4  for β = 0, 1, . . . , 2^(m−λ) − 1 do // calculation
 5      if φ mod 2 = 0 then // apply Equation (4)
 6          for u′ ∈ {0, 1} do
 7              P_(λ)[⟨φ, β⟩][u′] ← Σ_(u″) ½ P_(λ−1)[⟨ψ, 2β⟩][u′ ⊕ u″] ·
 8                  P_(λ−1)[⟨ψ, 2β + 1⟩][u″]
 9      else // apply Equation (5)
10          set u′ ← B_(λ)[⟨φ − 1, β⟩]
11          for u″ ∈ {0, 1} do
12              P_(λ)[⟨φ, β⟩][u″] ← ½ P_(λ−1)[⟨ψ, 2β⟩][u′ ⊕ u″] ·
13                  P_(λ−1)[⟨ψ, 2β + 1⟩][u″]

Algorithm 4: recursivelyUpdateB(λ, φ) implementation I
Require: φ is odd
 1  set ψ ← ⌊φ/2⌋
 2  for β = 0, 1, . . . , 2^(m−λ) − 1 do
 3      B_(λ−1)[⟨ψ, 2β⟩] ← B_(λ)[⟨φ − 1, β⟩] ⊕ B_(λ)[⟨φ, β⟩]
 4      B_(λ−1)[⟨ψ, 2β + 1⟩] ← B_(λ)[⟨φ, β⟩]
 5  if ψ mod 2 = 1 then
 6      recursivelyUpdateB(λ − 1, ψ)

Algorithm 5: Space efficient SC decoder, main loop
Input: the received vector y
Output: a decoded codeword ĉ
 1  for β = 0, 1, . . . , n − 1 do // Initialization
 2      set P₀[β][0] ← W(y_(β)|0), P₀[β][1] ← W(y_(β)|1)
 3  for φ = 0, 1, . . . , n − 1 do // Main loop
 4      recursivelyCalcP(m, φ)
 5      if u_(φ) is frozen then
 6          set C_(m)[0][φ mod 2] to the frozen value of u_(φ)
 7      else
 8          if P_(m)[0][0] > P_(m)[0][1] then
 9              set C_(m)[0][φ mod 2] ← 0
10          else
11              set C_(m)[0][φ mod 2] ← 1
12      if φ mod 2 = 1 then
13          recursivelyUpdateC(m, φ)
14  return the decoded codeword: ĉ = (C₀[β][0])_(β=0)^(n−1)

Algorithm 6: recursivelyCalcP(λ, φ) space-efficient
Input: layer λ and phase φ
 1  if λ = 0 then return // Stopping condition
 2  set ψ ← ⌊φ/2⌋
    // Recurse first, if needed
 3  if φ mod 2 = 0 then recursivelyCalcP(λ − 1, ψ)
    // Perform the calculation
 4  for β = 0, 1, . . . , 2^(m−λ) − 1 do
 5      if φ mod 2 = 0 then // apply Equation (4)
 6          for u′ ∈ {0, 1} do
 7              P_(λ)[β][u′] ← Σ_(u″) ½ P_(λ−1)[2β][u′ ⊕ u″] · P_(λ−1)[2β + 1][u″]
 8      else // apply Equation (5)
 9          set u′ ← C_(λ)[β][0]
10          for u″ ∈ {0, 1} do
11              P_(λ)[β][u″] ← ½ P_(λ−1)[2β][u′ ⊕ u″] · P_(λ−1)[2β + 1][u″]
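The two update rules used in the calculation loop of Algorithm 6 can be tried on a single pair of inputs with a short Python sketch. The function names calc_even and calc_odd are hypothetical, and the inputs pa and pb stand for the pairs P_(λ−1)[2β] and P_(λ−1)[2β+1].

```python
def calc_even(pa, pb):
    """Even phase: P[u1] = sum over u2 of 1/2 * pa[u1 XOR u2] * pb[u2]."""
    return [0.5 * sum(pa[u1 ^ u2] * pb[u2] for u2 in (0, 1)) for u1 in (0, 1)]


def calc_odd(pa, pb, u1):
    """Odd phase, with the earlier bit u1 already decided: P[u2] = 1/2 * pa[u1 XOR u2] * pb[u2]."""
    return [0.5 * pa[u1 ^ u2] * pb[u2] for u2 in (0, 1)]


even = calc_even([0.9, 0.1], [0.8, 0.2])
odd = calc_odd([0.9, 0.1], [0.8, 0.2], 0)
```

As a consistency check on the formulas as written, the two odd-phase values for a decided bit u′ add up to the even-phase value for that same u′.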

Algorithm 7: recursivelyUpdateC(λ, φ) space-efficient
Input: layer λ and phase φ
Require: φ is odd
 1  set ψ ← ⌊φ/2⌋
 2  for β = 0, 1, . . . , 2^(m−λ) − 1 do
 3      C_(λ−1)[2β][ψ mod 2] ← C_(λ)[β][0] ⊕ C_(λ)[β][1]
 4      C_(λ−1)[2β + 1][ψ mod 2] ← C_(λ)[β][1]
 5  if ψ mod 2 = 1 then
 6      recursivelyUpdateC(λ − 1, ψ)

Algorithm 8: initializeDataStructures( )
 1  inactivePathIndices ← new stack with capacity L
 2  activePath ← new boolean array of size L
 3  arrayPointer_P ← new 2-D array of size (m + 1) × L, the elements of which are array pointers
 4  arrayPointer_C ← new 2-D array of size (m + 1) × L, the elements of which are array pointers
 5  pathIndexToArrayIndex ← new 2-D array of size (m + 1) × L
 6  inactiveArrayIndices ← new array of size m + 1, the elements of which are stacks with capacity L
 7  arrayReferenceCount ← new 2-D array of size (m + 1) × L
    // Initialization of data structures
 8  for λ = 0, 1, . . . , m do
 9      for s = 0, 1, . . . , L − 1 do
10          arrayPointer_P[λ][s] ← new array of float pairs of size 2^(m−λ)
11          arrayPointer_C[λ][s] ← new array of bit pairs of size 2^(m−λ)
12          arrayReferenceCount[λ][s] ← 0
13          push(inactiveArrayIndices[λ], s)
14  for l = 0, 1, . . . , L − 1 do
15      activePath[l] ← false
16      push(inactivePathIndices, l)

Algorithm 9: assignInitialPath( )
Output: index l of initial path
 1  l ← pop(inactivePathIndices)
 2  activePath[l] ← true
    // Associate arrays with path index
 3  for λ = 0, 1, . . . , m do
 4      s ← pop(inactiveArrayIndices[λ])
 5      pathIndexToArrayIndex[λ][l] ← s
 6      arrayReferenceCount[λ][s] ← 1
 7  return l

Algorithm 10: clonePath(l)
Input: index l of path to clone
Output: index l′ of copy
 1  l′ ← pop(inactivePathIndices)
 2  activePath[l′] ← true
    // Make l′ reference same arrays as l
 3  for λ = 0, 1, . . . , m do
 4      s ← pathIndexToArrayIndex[λ][l]
 5      pathIndexToArrayIndex[λ][l′] ← s
 6      arrayReferenceCount[λ][s]++
 7  return l′

Algorithm 11: killPath(l)
Input: index l of path to kill
    // Mark the path index l as inactive
 1  activePath[l] ← false
 2  push(inactivePathIndices, l)
    // Disassociate arrays with path index
 3  for λ = 0, 1, . . . , m do
 4      s ← pathIndexToArrayIndex[λ][l]
 5      arrayReferenceCount[λ][s]−−
 6      if arrayReferenceCount[λ][s] = 0 then
 7          push(inactiveArrayIndices[λ], s)

Algorithm 12: getArrayPointer_P(λ, l)
Input: layer λ and path index l
Output: pointer to corresponding probability pair array
    // getArrayPointer_C(λ, l) is defined identically, up to the obvious changes in lines 6 and 10
 1  s ← pathIndexToArrayIndex[λ][l]
 2  if arrayReferenceCount[λ][s] = 1 then
 3      s′ ← s
 4  else
 5      s′ ← pop(inactiveArrayIndices[λ])
 6      copy the contents of the array pointed to by arrayPointer_P[λ][s] into that pointed to by arrayPointer_P[λ][s′]
 7      arrayReferenceCount[λ][s]−−
 8      arrayReferenceCount[λ][s′] ← 1
 9      pathIndexToArrayIndex[λ][l] ← s′
10  return arrayPointer_P[λ][s′]

Algorithm 13: pathIndexInactive(l)
Input: path index l
Output: true if path l is inactive, and false otherwise
 1  if activePath[l] = true then
 2      return false
 3  else
 4      return true

Algorithm 14: recursivelyCalcP(λ, φ) list version
Input: layer λ and phase φ
 1  if λ = 0 then return // stopping condition
 2  set ψ ← ⌊φ/2⌋
    // Recurse first, if needed
 3  if φ mod 2 = 0 then recursivelyCalcP(λ − 1, ψ)
    // Perform the calculation
 4  σ ← 0
 5  for l = 0, 1, . . . , L − 1 do
 6      if pathIndexInactive(l) then
 7          continue
 8      P_(λ) ← getArrayPointer_P(λ, l)
 9      P_(λ−1) ← getArrayPointer_P(λ − 1, l)
10      C_(λ) ← getArrayPointer_C(λ, l)
11      for β = 0, 1, . . . , 2^(m−λ) − 1 do
12          if φ mod 2 = 0 then // apply Equation (4)
13              for u′ ∈ {0, 1} do
14                  P_(λ)[β][u′] ← Σ_(u″) ½ P_(λ−1)[2β][u′ ⊕ u″] · P_(λ−1)[2β + 1][u″]
15                  σ ← max(σ, P_(λ)[β][u′])
16          else // apply Equation (5)
17              set u′ ← C_(λ)[β][0]
18              for u″ ∈ {0, 1} do
19                  P_(λ)[β][u″] ← ½ P_(λ−1)[2β][u′ ⊕ u″] · P_(λ−1)[2β + 1][u″]
20                  σ ← max(σ, P_(λ)[β][u″])
    // normalize probabilities
21  for l = 0, 1, . . . , L − 1 do
22      if pathIndexInactive(l) then
23          continue
24      P_(λ) ← getArrayPointer_P(λ, l)
25      for β = 0, 1, . . . , 2^(m−λ) − 1 do
26          for u ∈ {0, 1} do
27              P_(λ)[β][u] ← P_(λ)[β][u]/σ

Algorithm 15: recursivelyUpdateC(λ, φ) list version
Input: layer λ and phase φ
Require: φ is odd
 1  set ψ ← ⌊φ/2⌋
 2  for l = 0, 1, . . . , L − 1 do
 3      if pathIndexInactive(l) then
 4          continue
 5      set C_(λ) ← getArrayPointer_C(λ, l)
 6      set C_(λ−1) ← getArrayPointer_C(λ − 1, l)
 7      for β = 0, 1, . . . , 2^(m−λ) − 1 do
 8          C_(λ−1)[2β][ψ mod 2] ← C_(λ)[β][0] ⊕ C_(λ)[β][1]
 9          C_(λ−1)[2β + 1][ψ mod 2] ← C_(λ)[β][1]
10  if ψ mod 2 = 1 then
11      recursivelyUpdateC(λ − 1, ψ)

Algorithm 16: SCL decoder, main loop
Input: the received vector y and a list size L, as a global
Output: a decoded codeword ĉ
    // Initialization
 1  initializeDataStructures( )
 2  l ← assignInitialPath( )
 3  P₀ ← getArrayPointer_P(0, l)
 4  for β = 0, 1, . . . , n − 1 do
 5      set P₀[β][0] ← W(y_(β)|0), P₀[β][1] ← W(y_(β)|1)
    // Main loop
 6  for φ = 0, 1, . . . , n − 1 do
 7      recursivelyCalcP(m, φ)
 8      if u_(φ) is frozen then
 9          continuePaths_FrozenBit(φ)
10      else
11          continuePaths_UnfrozenBit(φ)
12      if φ mod 2 = 1 then
13          recursivelyUpdateC(m, φ)
    // Return the best codeword in the list
14  l ← findMostProbablePath( )
15  set C₀ ← getArrayPointer_C(0, l)
16  return ĉ = (C₀[β][0])_(β=0)^(n−1)

Algorithm 17: continuePaths_FrozenBit(φ)
Input: phase φ
 1  for l = 0, 1, . . . , L − 1 do
 2      if pathIndexInactive(l) then continue
 3      C_(m) ← getArrayPointer_C(m, l)
 4      set C_(m)[0][φ mod 2] to the frozen value of u_(φ)

Algorithm 18: continuePaths_UnfrozenBit(φ)
Input: phase φ
 1  probForks ← new 2-D float array of size L × 2
 2  i ← 0
    // populate probForks
 3  for l = 0, 1, . . . , L − 1 do
 4      if pathIndexInactive(l) then
 5          probForks[l][0] ← −1
 6          probForks[l][1] ← −1
 7      else
 8          P_(m) ← getArrayPointer_P(m, l)
 9          probForks[l][0] ← P_(m)[0][0]
10          probForks[l][1] ← P_(m)[0][1]
11          i ← i + 1
12  ρ ← min(2i, L)
13  contForks ← new 2-D boolean array of size L × 2
    // The following is possible in O(L) time
14  populate contForks such that contForks[l][b] is true iff probForks[l][b] is one of the ρ largest entries in probForks (and ties are broken arbitrarily)
    // First, kill-off non-continuing paths
15  for l = 0, 1, . . . , L − 1 do
16      if pathIndexInactive(l) then
17          continue
18      if contForks[l][0] = false and contForks[l][1] = false then
19          killPath(l)
    // Then, continue relevant paths, and duplicate if necessary
20  for l = 0, 1, . . . , L − 1 do
21      if contForks[l][0] = false and contForks[l][1] = false then // both forks are bad, or invalid
22          continue
23      C_(m) ← getArrayPointer_C(m, l)
24      if contForks[l][0] = true and contForks[l][1] = true then // both forks are good
25          set C_(m)[0][φ mod 2] ← 0
26          l′ ← clonePath(l)
27          C_(m) ← getArrayPointer_C(m, l′)
28          set C_(m)[0][φ mod 2] ← 1
29      else // exactly one fork is good
30          if contForks[l][0] = true then
31              set C_(m)[0][φ mod 2] ← 0
32          else
33              set C_(m)[0][φ mod 2] ← 1

Algorithm 19: findMostProbablePath( )
Output: the index l′ of the most probable path
 1  l′ ← 0, p′ ← 0
 2  for l = 0, 1, . . . , L − 1 do
 3      if pathIndexInactive(l) then
 4          continue
 5      C_(m) ← getArrayPointer_C(m, l)
 6      P_(m) ← getArrayPointer_P(m, l)
 7      if p′ < P_(m)[0][C_(m)[0][1]] then
 8          l′ ← l, p′ ← P_(m)[0][C_(m)[0][1]]
 9  return l′

The invention claimed is:
 1. A method of decoding data encoded with a polar code, the method being implemented via code stored on a non-transient medium in a receiver and comprising: receiving a word of polar encoded data from a channel and conducting decoding of the word by following several distinct decoding paths to generate codeword candidates; list decoding by successively duplicating and pruning said decoding paths to generate a list of potential decoding paths, selecting a single decoding path from the list of potential decoding paths and thereby identifying a single codeword output.
 2. The method of claim 1, wherein said duplicating and pruning splits a decoding path into two child paths to be examined for each decision on the value of an unfrozen bit.
 3. The method of claim 2, wherein the list decoding assigns child paths duplicate data structures each time a decoding path splits; each assigned duplicate data structure is flagged as belonging to multiple paths without copying the data structures at the time of assignment; and a copy of an assigned duplicate data structure is made only when a selected decoding path requires access to an assigned duplicate data structure during decoding.
 4. The method of claim 1, wherein said duplicating and pruning doubles the number of decoding paths at each decoding step, and then performs pruning procedure to discard all but the L best paths.
 5. The method of claim 1, wherein the pruning is performed using an accumulated likelihood of each path.
 6. The method of claim 1, implemented in a cellular network device.
 7. The method of claim 1, wherein the word of polar coded data includes k-r unfrozen bits of k unfrozen bits as data bits and r bits of redundancy data; and wherein the pruning uses the redundancy data for decoding decisions to prune the list of potential decoding paths.
 8. The method of claim 7, wherein said pruning discards all paths with incorrect cyclic redundancy check values.
 9. A method for encoding and decoding data using polar codes, the method being implemented via code stored on a non-transient medium in a receiver and comprising: reserving k-r unfrozen bits of k unfrozen bits available as data bits; using the remaining r unfrozen bits to add redundancy to the data bits; and then using said redundancy to aid in the selection of a decoding path from a list of decoding paths generated during the decoding.
 10. The method of claim 9, wherein the redundancy bits are assigned to cyclic redundancy check (CRC) values of the data.
 11. The method of claim 10, wherein all the decoding paths with incorrect cyclic redundancy check values are discarded.
 12. The method of claim 9, implemented in a cellular network device.
 13. A polar decoding device for receiving polar coded data from a channel, the decoding device comprising: means for list decoding the polar coded data; and means for successively duplicating and pruning decoding paths during decoding.